196-2007: Pipes and Threads: Performance Testing of Advanced Scalability Features in SAS®9

Authors

  • Gerhardt Pohl
  • Fred Forst
  • Mario Widel
  • Thomas Burger
  • Eli Lilly
Abstract

SAS v9 provides many new features of interest to users with large data sets. Many of the commonly used procedures are multi-threaded, which allows users to harness the full power of multi-processor architectures to enhance the efficiency of some tasks. In addition, SAS/CONNECT allows users to spawn multiple independent processes to execute custom parallel processing solutions. SAS/CONNECT also enables pipeline parallelism, i.e., piping of records from one DATA/PROC step to another between processes. This potentially relieves some of the I/O burden of the total SAS job, an important consideration when dealing with large data sets.

We examined the real-world speed of these three features on two different multiprocessor system architectures: a Sun Microsystems Sun Fire 3800 running Unix and an IBM 9672/Y36 running MVS. We investigated a typical usage pattern, a simple DATA step followed by a PROC TABULATE. Test data were from a simulated insurance claims file, with test runs ranging from 10 million to approximately 322 million records.

Our tests suggest that, on an architecture based on traditional synchronous processing with the DATA step writing intermediate output to disk, SAS v9 does not perform substantially differently from SAS v8. However, the enhanced features in SAS v9, piping and multi-threading, yield significantly shorter total run times on more powerful architectures. These performance gains were evident at all file sizes tested (1.2 to 38.4 GB). We also noted that the built-in multi-threading in PROC TABULATE performed better than a manual parallel processing solution. This is an important finding, since it suggests that shorter run times can be achieved directly with the base product, without the added expense of a complicated programming effort.

INTRODUCTION

The introduction of SAS v9 marked a major upgrade in the SAS computing system, with many features targeting scalability for large data sets. Many procedures have been enhanced with the ability to perform parallel processing, including some of the most frequently used ones (MEANS, REPORT, SORT, SQL, SUMMARY, TABULATE, GLM, REG). Introduced in v8, SAS/CONNECT gives you the ability to exploit SMP (Symmetric Multi-Processing) hardware as well as network resources to perform parallel processing and easily coordinate all the results into a single client SAS session. In v9, SAS/CONNECT supports pipeline parallelism, which allows multiple DATA steps or procedures to execute in parallel and to pipe the output from one process as the input to the next process in a pipeline. Piping improves performance and reduces the demand for disk space.

SYSTEM ARCHITECTURES

In our testing we used two basic architectures.

Similar resources

SUGI 28: An Inside Look at Version 9 and 9.1 Threaded Base SAS(r) Procedures

This paper looks at the changes in key SAS Base procedures which now incorporate threading to achieve significant performance improvements on SMP architectures. The paper is relevant to the following SAS procedures: SORT, SUMMARY, MEANS, TABULATE, REPORT and SQL. We look at software scalability in general and specifically how threading has been utilized in these procedures to enhance scalabilit...

SUGI 27: Up and Out: Where We're Going with Scalability in SAS(r) Version 9

This paper gives an overview of the ways that SAS is addressing performance through scalability in SAS Version 9. Scalability features have been implemented in many areas of SAS Version 9 to allow your applications to scale up and scale out. These include: • Multi-Process (MP) CONNECT, • the Scalable Performance Data Engine (SPDE engine), • certain SAS/ACCESS engines, • several scalable SAS pro...

SUGI 28: Developing Client/Server Applications to Maximize SAS 9 Parallel Capabilities

Parallel SAS processes, multiple threads within a single SAS process, parallel SAS servers, distributed computing, grid computing, cluster computing – how might all of these choices affect the way that you develop your client/server SAS applications? This paper will address some of these areas and what you should understand when you develop or run client/server SAS applications in these environ...

SUGI 27: SAS(r) Meets Big Iron: High Performance Computing in SAS(r) Analytical Procedures

Version 9 targets the heavy-duty analytic procedures in SAS® for high performance computing enhancements. These enhancements encompass both algorithmic improvements and modifications to exploit multiprocessor hardware. This paper provides a survey of this development and the performance gains obtained in several procedures in SAS/STAT and Enterprise Miner. Some general scalability issues are ...

Scalability of the SAS/STAT HPGENSELECT High-Performance Analytical Procedure: A comparison with RevoScaleR

Effectively implementing high-performance analytics software solutions in the insurance industry. Executive Summary: At the Strata Conference on October 25, 2012, the research and planning division of a large insurance corporation (hereafter "insurer") presented various methods that they used to model 150 million observations of insurance data. A summary of their presentation is available at: ...

Publication date: 2007